-
Notifications
You must be signed in to change notification settings - Fork 129
New issue
Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.
By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.
Already on GitHub? Sign in to your account
Add DocValues Support for Lucene Byte Sized Vector #953
Add DocValues Support for Lucene Byte Sized Vector #953
Conversation
f852135
to
1fcba06
Compare
Codecov Report
@@ Coverage Diff @@
## feature/lucene_byte_vector #953 +/- ##
================================================================
- Coverage 85.07% 85.07% -0.01%
- Complexity 1129 1141 +12
================================================================
Files 154 154
Lines 4616 4682 +66
Branches 412 422 +10
================================================================
+ Hits 3927 3983 +56
- Misses 499 507 +8
- Partials 190 192 +2
|
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Can we also add uni tests in scripting for BYTE type. Seems that in this PR we only add FLOAT to existing test cases.
src/main/java/org/opensearch/knn/index/KNNVectorScriptDocValues.java
Outdated
Show resolved
Hide resolved
src/main/java/org/opensearch/knn/index/KNNVectorScriptDocValues.java
Outdated
Show resolved
Hide resolved
src/main/java/org/opensearch/knn/index/mapper/KNNVectorFieldMapper.java
Outdated
Show resolved
Hide resolved
Will raise a separate PR to improve test coverage. Will add these unit tests in that PR |
081d157
to
4c24661
Compare
Build is failing due to this change in core opensearch-project/OpenSearch#8312 |
@naveentatikonda can you make build and test github action for this change. |
@naveentatikonda Seems like there are some changes in core.
You might need to fix it. |
Yes, the build was succeeding for this PR before opensearch core made this change. |
Sure, let me check and raise a separate PR to fix this. I thought Junqiu was working on this to fix the build on Friday. |
@naveentatikonda he was working on fixing the 2.x branch. |
|
||
@Override | ||
public float[] getValue(BytesRef binaryValue) { | ||
float[] vector = new float[binaryValue.length]; |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Why use a float[] for byte type? Shouldnt it be an int[]?
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
We are using this method for scripting functions to retrieve the vector from docValues to calculate the score. SO, it doesn't make any difference if we return it as int[] or float[]. If we return it as int[] then again we need to do some method overloading and add methods for the spacetype functions in ScoringUtils to accept int[].
import java.util.Map; | ||
import java.util.Optional; | ||
import java.util.function.Supplier; | ||
|
||
import static org.opensearch.knn.common.KNNConstants.DEFAULT_VECTOR_DATA_TYPE_FIELD; | ||
import static org.opensearch.knn.common.KNNConstants.KNN_METHOD; | ||
import static org.opensearch.knn.common.KNNConstants.VECTOR_DATA_TYPE_FIELD; | ||
import static org.opensearch.knn.index.KNNSettings.KNN_INDEX; |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Do changes in this file related to adding docvalues support for lucene byte sized vectors?
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Yes, here we are trying to add some extra validation checks wrt knnIndex setting and knn engine. Also, to ingest doc values for byte vectors using script scoring
} | ||
|
||
@SneakyThrows | ||
private void ingestL2ByteTestData() { |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Why does this need to be L2?
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
I actually thought of creating different methods for test data for other spacetypes later. So, I created it with L2 as I'm using it for spacetype L2.
185a7a0
to
598ab43
Compare
…earch-project#962) Signed-off-by: Martin Gaievski <[email protected]>
Signed-off-by: Naveen Tatikonda <[email protected]>
Signed-off-by: Naveen Tatikonda <[email protected]>
Signed-off-by: Naveen Tatikonda <[email protected]>
Signed-off-by: Naveen Tatikonda <[email protected]>
Signed-off-by: Naveen Tatikonda <[email protected]>
598ab43
to
0eb0adf
Compare
5b7c44e
into
opensearch-project:feature/lucene_byte_vector
…t#953) Signed-off-by: Naveen Tatikonda <[email protected]>
* Add Indexing Support for Lucene Byte Sized Vector (#937) * Add Indexing Support for Lucene Byte Sized Vector Signed-off-by: Naveen Tatikonda <[email protected]> * Add tests for Indexing Signed-off-by: Naveen Tatikonda <[email protected]> * Add CHANGELOG Signed-off-by: Naveen Tatikonda <[email protected]> * Address Review Comments Signed-off-by: Naveen Tatikonda <[email protected]> --------- Signed-off-by: Naveen Tatikonda <[email protected]> * Add Querying Support to Lucene Byte Sized Vector (#956) * Add Querying Support to Lucene Byte Sized Vector Signed-off-by: Naveen Tatikonda <[email protected]> * Add CHANGELOG Signed-off-by: Naveen Tatikonda <[email protected]> * Address Review Comments Signed-off-by: Naveen Tatikonda <[email protected]> --------- Signed-off-by: Naveen Tatikonda <[email protected]> * Add DocValues Support for Lucene Byte Sized Vector (#953) Signed-off-by: Naveen Tatikonda <[email protected]> * Update Release Notes Signed-off-by: Naveen Tatikonda <[email protected]> --------- Signed-off-by: Naveen Tatikonda <[email protected]>
* Add Indexing Support for Lucene Byte Sized Vector (#937) * Add Indexing Support for Lucene Byte Sized Vector Signed-off-by: Naveen Tatikonda <[email protected]> * Add tests for Indexing Signed-off-by: Naveen Tatikonda <[email protected]> * Add CHANGELOG Signed-off-by: Naveen Tatikonda <[email protected]> * Address Review Comments Signed-off-by: Naveen Tatikonda <[email protected]> --------- Signed-off-by: Naveen Tatikonda <[email protected]> * Add Querying Support to Lucene Byte Sized Vector (#956) * Add Querying Support to Lucene Byte Sized Vector Signed-off-by: Naveen Tatikonda <[email protected]> * Add CHANGELOG Signed-off-by: Naveen Tatikonda <[email protected]> * Address Review Comments Signed-off-by: Naveen Tatikonda <[email protected]> --------- Signed-off-by: Naveen Tatikonda <[email protected]> * Add DocValues Support for Lucene Byte Sized Vector (#953) Signed-off-by: Naveen Tatikonda <[email protected]> * Update Release Notes Signed-off-by: Naveen Tatikonda <[email protected]> --------- Signed-off-by: Naveen Tatikonda <[email protected]> (cherry picked from commit bf04854)
* Add Indexing Support for Lucene Byte Sized Vector (#937) * Add Indexing Support for Lucene Byte Sized Vector Signed-off-by: Naveen Tatikonda <[email protected]> * Add tests for Indexing Signed-off-by: Naveen Tatikonda <[email protected]> * Add CHANGELOG Signed-off-by: Naveen Tatikonda <[email protected]> * Address Review Comments Signed-off-by: Naveen Tatikonda <[email protected]> --------- Signed-off-by: Naveen Tatikonda <[email protected]> * Add Querying Support to Lucene Byte Sized Vector (#956) * Add Querying Support to Lucene Byte Sized Vector Signed-off-by: Naveen Tatikonda <[email protected]> * Add CHANGELOG Signed-off-by: Naveen Tatikonda <[email protected]> * Address Review Comments Signed-off-by: Naveen Tatikonda <[email protected]> --------- Signed-off-by: Naveen Tatikonda <[email protected]> * Add DocValues Support for Lucene Byte Sized Vector (#953) Signed-off-by: Naveen Tatikonda <[email protected]> * Update Release Notes Signed-off-by: Naveen Tatikonda <[email protected]> --------- Signed-off-by: Naveen Tatikonda <[email protected]> (cherry picked from commit bf04854)
* Add Indexing Support for Lucene Byte Sized Vector (#937) * Add Indexing Support for Lucene Byte Sized Vector Signed-off-by: Naveen Tatikonda <[email protected]> * Add tests for Indexing Signed-off-by: Naveen Tatikonda <[email protected]> * Add CHANGELOG Signed-off-by: Naveen Tatikonda <[email protected]> * Address Review Comments Signed-off-by: Naveen Tatikonda <[email protected]> --------- Signed-off-by: Naveen Tatikonda <[email protected]> * Add Querying Support to Lucene Byte Sized Vector (#956) * Add Querying Support to Lucene Byte Sized Vector Signed-off-by: Naveen Tatikonda <[email protected]> * Add CHANGELOG Signed-off-by: Naveen Tatikonda <[email protected]> * Address Review Comments Signed-off-by: Naveen Tatikonda <[email protected]> --------- Signed-off-by: Naveen Tatikonda <[email protected]> * Add DocValues Support for Lucene Byte Sized Vector (#953) Signed-off-by: Naveen Tatikonda <[email protected]> * Update Release Notes Signed-off-by: Naveen Tatikonda <[email protected]> --------- Signed-off-by: Naveen Tatikonda <[email protected]> (cherry picked from commit bf04854) Signed-off-by: Naveen Tatikonda <[email protected]>
* Add Indexing Support for Lucene Byte Sized Vector (#937) * Add Indexing Support for Lucene Byte Sized Vector Signed-off-by: Naveen Tatikonda <[email protected]> * Add tests for Indexing Signed-off-by: Naveen Tatikonda <[email protected]> * Add CHANGELOG Signed-off-by: Naveen Tatikonda <[email protected]> * Address Review Comments Signed-off-by: Naveen Tatikonda <[email protected]> --------- Signed-off-by: Naveen Tatikonda <[email protected]> * Add Querying Support to Lucene Byte Sized Vector (#956) * Add Querying Support to Lucene Byte Sized Vector Signed-off-by: Naveen Tatikonda <[email protected]> * Add CHANGELOG Signed-off-by: Naveen Tatikonda <[email protected]> * Address Review Comments Signed-off-by: Naveen Tatikonda <[email protected]> --------- Signed-off-by: Naveen Tatikonda <[email protected]> * Add DocValues Support for Lucene Byte Sized Vector (#953) Signed-off-by: Naveen Tatikonda <[email protected]> * Update Release Notes Signed-off-by: Naveen Tatikonda <[email protected]> --------- Signed-off-by: Naveen Tatikonda <[email protected]> (cherry picked from commit bf04854) Signed-off-by: Naveen Tatikonda <[email protected]>
Description
The changes in this PR adds doc values support for fields with byte sized vectors, which helps users to perform script scoring and painless scripting search functionalities on these indices.
Issues Resolved
#812
Check List
By submitting this pull request, I confirm that my contribution is made under the terms of the Apache 2.0 license.
For more information on following Developer Certificate of Origin and signing off your commits, please check here.